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Abstract 

The  TPC!-C  benchmark  is  a  new  benchmark  approved  by  the  TPC  council  intended  for  com¬ 
paring  database  platforms  running  a  medium  complexity  transaction  processing  workload.  .Some 
key  aspects  in  which  this  new  benchmark  differs  from  the  TPC-A  benchmark  are  in  having  several 
transaction  types,  some  of  which  are  more  complex  than  that  in  TPC-A,  and  in  having  data  acce.ss 
skew.  In  this  paper  we  present  results  from  a  modelling  study  of  the  TPC-C  benchmark  for  both 
single  node  and  distributed  database  management  systems.  We  simulate  the  TPC-C  workload  to 
determine  expected  buffer  miss  rates  assuming  an  LRU  buffer  management  policy.  These  miss  rates 
are  then  used  as  inputs  to  a  throughput  model.  ^From  these  models  we  show  the  following:  (i)  We 
quantify  the  data  access  skew  as  specified  in  the  benchmark  and  show  what  fraction  of  the  accesses 
go  to  what  fraction  of  the  data,  (ii)  We  quantify  the  resulting  buffer  hit  ratios  for  each  relation  as 
a  function  of  buffer  size,  (iii)  We  show  that  close  to  linear  scale-up  (about  3%  from  the  ideal)  can 
be  achieved  in  a  distributed  system,  assuming  replication  of  a  read-only  table,  (iv)  We  examine 
the  effect  of  packing  hot  tuples  into  pages  and  show  that  significant  price/ performance  benefit  can 
be  thus  achieved,  (v)  Finally,  by  coupling  the  buffer  simulations  with  the  throughput  model,  we 
examine  typical  disk/memory  configurations  that  maximize  the  overall  price/j)erformance. 
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1  Introduction 


The  TP('  Benchmark  C  (TPC-C)  [7,  9]  is  intended  to  model  a  medium  complexity  online  transaction 
processing  (OLTP)  workload.  It  is  patterned  after  an  order-entry  workload,  with  multiple  tran.saction 
types  ranging  from  simple  transactions  that  are  comparable  to  the  simple  debit-credit  workload  in 
the  TPC-A/B  benchmarks  [6],  to  medium  complexity  transactions  that  have  two  to  fifty  times  the 
number  of  calls  of  the  simple  transactions. 

An  important  aspect  of  the  workload  is  that  is  specifies  skewed  (i.e.  non-uniform)  access  within 
individual  data  types/relations.  By  contrast,  the  TPC-A  benchmark  assumes  uniform  access  within 
each  relation/data  type.  The  skewed  access,  which  is  typical  for  many  OLTP  workloads  [4]  allows 
better  use  of  the  main  memory  database  buffer  by  allowing  it  to  capture  the  hot  data  items. 

The  benchmark  specifies  a  non-uniform  random  number  generation  function  to  be  used  for  gen¬ 
eration  of  tuple-ids.  We  provide  insight  into  the  distribution  of  this  skew  by  simulating  this  function 
as  specified  by  the  benchmark.  The  output  of  this  simulation  specifies  the  skew  at  the  tuple  level, 
yet  most  typical  DBMS’s  access  and  store  data  in  pages.  Therefore,  to  estimate  the  skew  at  the 
page  level  we  also  simulate  the  function  assur.Jng  tuples  are  packed  sequentially  into  pages.  These 
results  provide  insight  into  the  workload  and  help  explain  the  miss  rate  results  obtained  in  our  buffer 
simulations.  In  addition  we  use  the  distribution  obtained  from  this  simulation  to  guide  us  in  packing 
tuples  into  pages  so  that  all  tuples  of  similar  ’’hotness”  wiU  be  in  the  same  page. 

We  assume  the  use  of  the  LRU  buffer  replacement  policy  for  the  database  buffer  and  simulate  the 
buffer  pool  to  determine  the  expected  miss  rates  for  each  relation.  We  use  the  miss  rates  obtained 
from  our  buffer  simulations  as  inputs  to  a  throughput  model.  Using  this  model,  we  explore  optimal 
buffer  sizes  to  minimize  hardware  costs.  Finally,  we  consider  the  impact  of  running  the  benchmark 
on  a  clustered/distributed  database  system,  examining  the  impact  of  replicating  one  of  the  read-only 
relations. 

We  focus  only  on  the  access  patterns  and  processing  requirements  of  the  benchmark.  We  do  not 
consider  terminal  emulation,  ACID  properties,  or  pricing.  When  we  present  price/performance  curves 
we  will  only  consider  hypothetical  costs  of  hardware  and  do  not  include  considerations  such  as  terminal 
emulation  or  software  maintenance  costs  as  outlined  in  the  TPC-C  specification  [9].  We  describe  the 
benchmark  transactions  only  in  the  level  of  detail  required  to  model  the  workload,  primarily  in  terms 
of  the  access  patterns  and  the  number  of  databa.se  calls  per  tran.saction.  Readers  interested  in  details 
such  a.s  which  fields  are  retrieved  and  updated  are  referred  to  the  benchmark  specification. 

The  rest  of  the  paper  is  organized  as  follows.  In  Section  2  we  provide  a  synopsis  of  the  fl’C-f’ 
workload,  sf)  that  the  pajier  is  reasonably  self  contained.  In  Sectioti  4  we  |)r<'seiit  simulation  results 


for  the  non-uniform  random  number  generation  routines  to  determine  the  degree  of  acress  skew. 
A  description  of  our  buffer  model  simulation  including  model  results  is  contained  in  Section  1,  .-X 
throughput  model  and  price/perfonnance  results  for  both  a  single  and  a  distributed  system  are  given 
in  Section  5.  Concluding  remarks  appear  in  Section  6. 

2  TPC-C  Workload  Synopsis 

This  section  gives  a  summary  of  the  TPC-C  workload.  For  a  more  thorough  treatment  see  the 
TPC  C  specification  [9]  and  overview  [7].  In  this  paper,  we  focus  only  on  the  access  f)atterns  artd 
processing  requirements  of  the  benchmark.  For  concreteness,  we  will  assume  a  relational  database 
model,  though  most  of  the  development  is  applicable  to  other  data  models.  We  first  give  an  overview 
of  all  five  transaction  types  in  the  benchmark  and  then  give  a  more  detailed  account  of  each  of  the 
transactions  in  the  following  section. 

2.1  TPC-C  Overview 

The  TPC-C  benchmark  is  intended  to  represent  a  generic  wholesale  supplier  workload.  The  workload 
is  primarily  a  transaction  processing  workload  with  multiple  SQL  calls  per  transaction,  but  also  has 
two  aggregates,  one  non-unique  select,  and  a  join.  The  workload  specifies  skew  (i.e.  non-uniform 
access)  at  the  tuple  level  for  three  of  the  relations. 

Figure  1  shows  the  Business  Environment  Hierarchy  of  the  TPC-C  workload.  This  figure  is  a 
reproduction  of  that  found  in  the  TPC-C  benchmark  specification  [9].  The  overall  database  consists 
of  a  number  of  warehouses.  Each  warehouse  is  compo.sed  of  ten  districts  where  each  district  has 
3,000  (3K)  customers.  There  are  lOOK  items  that  are  stocked  by  each  warehouse.  The  stock  level 
for  each  item  at  each  warehouse  is  maintained  in  the  Stock  relation.  Customers  place  orders  that  are 
maintained  in  three  relations:  in  the  Order  relation  a  permanent  record  of  each  order  is  maintained;  in 
the  New-Order  relation,  pending  orders  are  maintained  and  later  deleted  by  a  Delivery  transartion;  in 
the  Order- Line  relation,  an  entry  is  made  for  each  item  ordered.  A  history  of  the  payment  transartion 
is  appended  to  the  History  relation. 

The  logical  database  design  is  composed  of  9  relations  as  listed  in  tairle  1  and  shown  in  Figure 
2.  In  the  table,  W  represents  the  number  of  warehouses.  W'e  make  the  assumption  that  oidy  integral 
units  of  tuples  fit  per  page.  The  cardinality  of  the  Warehon.se,  District,  (bistonier.  and  Stork  relations 
scale  with  the  number  of  warehoii.ses.  This  is  similar  to  the  TP(SA  benchmark  w  here  the  cardinality 
of  the  Branch,  Teller,  and  Account  relations  scale  with  the  number  of  branches,  'l  lu'  lleni  relation 
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Table  1;  Summary  of  Logical  Database 


Relation 

Name 

Cardinality 

Tuple 

Length 

Tuples  Per 
4K  Page 

warehouse 

W 

46 

district 

W  *  10 

43 

customer 

W  ♦  30K 

6.^.')  bytes 

6 

stock 

W  ♦  lOOK 

306  bytes 

13 

item 

lOOK 

82  bytes 

49 

order 

24  bytes 

1/0 

new-order 

8  bytes 

512 

order-line 

54  bytes 

75 

history 

46  bytes 

89 

Table  2:  Summary  of  Transactions 


Transaction 

Minimum  % 

Assumed  % 

Selects 

Updates 

Inserts 

Deletes 

Non-Unique  Select 

Join 

New  Order 

★ 

43 

23 

11 

12 

0 

0 

0 

Payment 

43 

44 

4.2 

3 

1 

0 

0.6 

0 

Order  Status 

4 

4 

11.4 

0 

0 

0 

0.6 

0 

Delivery 

4 

5 

130 

120 

0 

10 

0 

0 

Stock  Level 

4 

4 

0 

0 

0 

0 

0 

1 

does  not  scale  with  the  number  of  warehouses.  The  Order,  Order-Line,  and  History  relations  grow 
indefinitely  as  orders  are  processed. 

There  are  five  transaction  types  in  TPCC  as  listed  in  table  2.  Further  details  of  the  specific 
relations  accessed  and  the  access  skew  are  given  in  Sections  2.2  and  .3.  The  New  Order  transaction 
places  an  order  for  10  items  from  a  warehouse,  inserts  the  order,  and  for  each  item  updates  the 
corresponding  stock  level.  The  Payment  transaction  processes  a  payment  for  a  customer  and  updates 
balances  and  other  data  in  the  Warehouse,  District  and  Customer  relations.  The  customer  can  be 
specified  either  by  a  unique  customer-id,  or  by  a  name.  In  the  latter  case,  on  the  average  three 
customers  qualify  from  which  one  is  selected.  When  specified  by  customer-id,  this  transaction  is  of 
comparable  complexity  to  the  TPC-A  transaction.  The  Order  Status  transaction  returns  the  status 
of  a  customer’s  last  order.  As  in  the  Payment  transaction,  the  customer  may  be  specified  by  the 
customer-id  or  by  name.  Each  item  in  the  last  customer  order  is  examined.  The  Delivery  transaction 
processes  orders  corresponding  to  10  pending  orders,  one  for  each  district,  with  10  items  per  order. 
The  corre.sponding  entry  in  the  New-Order  relation  is  deleted.  Finally,  the  Stock  Level  transaction 
examines  the  quantity  of  stock  for  the  items  ordered  by  each  of  the  last  20  orders  in  a  district. 

Table  2  summarizes  the  transactions  ba.sed  on  the  percent  of  the  workload  each  transaction 
comprises,  and  the  number  of  selects,  updates,  inserts,  deletes,  non-unique  selects,  and  joins  for  a 
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relational  iiiorlel.  'I'here  is  a  coluini'.  for  miniiiuitii  iioreent  of  workiond  am!  a  (niutnn  for  a^snim  d 
percent  of  workload.  The  benchmark  six'cifies  a  ininimuiii  percent  for  all  the  transaction  types  ext  ep! 
the  New  Order  transaction.  The  benchmark  metric  i.s  the  tntmber  of  New  Order  transactions  proi  e-sed 
per  minute,  hence,  it  i.s  de.sirable  to  .set  the  percent  .New  Order  as  hieh  as  possible  {  i.')'/!  )  lakine  into 
account  that  the  size  of  the  New-Order  relation  will  erow  wilhont  hotind  iitiless  the  relative  rate  of 
Delivery  transactions  is  suflident  to  delete  the  entries  in  the  .\ew-Order  relatitm  at  th<>  saim'  rate 
that  the  New-Order  transaction  inserts  them.  The  third  column  in  llte  tal)h'  is  the  percent  of  the 
workload  mix  that  wc  have  assumed  for  all  studies  in  this  paper.  We  have'  assumed  the  percetit  of 
Delivery  transaction  is  5%  to  ensure  that  the  size  of  the  New-Order  relation  remains  small  since  our 
simulations  must  niaiiitaiii  the  contents  of  the  relation  as  the  simulation  proceeds.  Note,  tlu'  percent 
New-Order  versus  Delivery  is  a  key  parameter  of  this  benchmark  and  should  1)»  tuned  carefully  to 
achieve  the  maximum  New-Order  transactions  per  second.  If  the  percent  New-Order  is  ■}'<%  and  tlie 
percent  Delivery  is  4%  then  the  New-Order  relation  will  grow  without  bound  causing  more  nusses  on 
the  New-Order  relation  to  occur  and  a  need  for  more  storage.  The  join  is  an  equi-join,  where  the  two 
relations  involved  each  have  at  most  200  tuples  that  meet  the  selection  predicate.  Further  description 
of  each  of  these  transactions  is  found  in  section  2.2. 

2.2  Transaction  Access  Patterns 

In  this  section  we  summarize  the  access  patterns  for  each  database  call  of  each  transaction.  For 
each  transaction  we  first  list  how  the  random  variables  are  generated,  and  then  list  the  database 
operations  made  by  that  transaction  in  a  simplified  pseudocode.  Although  our  pseudocode  is  not  in 
SQL  it  succinctly  conveys  the  function  of  each  transaction.  The  TPC-C  specification  includes  sample 
code  [9]  for  each  transaction.  In  the  description  of  how  the  input  data  is  generated  many  of  the 
tuple-ids  are  generated  from  the  NU()  function.  We  define  and  simulate  thi.s  function  in  section  IT 
for  now  just  view  this  as  a  non-uniform  distribution. 

New  Order  Transaction 

This  transactions  places  an  order  that  consists  of  an  average  of  10  items.  The  input  is  generated 
as  follows: 

whouse-id  uniform 

dist-id  uniform 

customer-id  NU(  1023,1 ,3000) 

number  of  items  uniform(5,10) 

item-id  NU{8 19 1,1, 100000) 

The  benchmark  specifies  that  there  are  10  districts  per  warehouse,  and  eacli  district  lia.s  one 

terminal.  All  transaction.s  initiated  by  a  terminal  use  that  terminal's  district  and  wareiionse  ninnbrr. 
Since  we  are  not  explicitly  modelling  the  terminals,  we  assume  the  wliouse-id  ami  dist-id  are  uiiifornily 
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distributed.  This  assumption  is  reasonable  since  each  terminal  is  submitting  requests  at  the  same 
rate. 

Below  we  list  the  simplified  format  of  the  New-Order  transaction: 

1.  Select(whouse-id)  from  Warehouse 

2.  Select(dist-id,  whouse-id)  from  District 

3.  Update(dist-id,  whouse-id)  in  District 

4.  Select(customer-id,  dist-id,  whouse-id)  from  Customer 

5.  Insert  into  Order 

6.  Insert  into  New-Order 

7.  For  each  item  (10  items): 

(a)  Select(item-id)  from  Item 

(b)  Select(item-id, whouse-id)  from  Stock 

(c)  Update(item-id, whouse-id)  in  Stock 

(d)  Insert  into  Order- Line 

8.  Commit 

In  the  benchmark,  a  district  is  associated  with  a  specific  warehouse,  hence,  the  key  used  to 
uniquely  identify  a  district  tuple  is  composed  of  two  fields:  (dist-id,  whouse-id).  Similarly,  the  key- 
used  to  uniquely  identify  a  customer  tuple  is  composed  of  three  fields:  (customer-id,  dist-id,  whouse- 
id).  In  the  benchmark  the  number  of  items  ordered  is  uniformly  distributed  between  5  and  15.  We 
assume  ah  transaction  have  a  fixed  number  of  items  ordered  equal  to  10.  This  assumption  also  has  no 
effect  on  our  results  since  we  only  report  mean  miss  rates  and  throughputs.  For  each  of  the  10  items 
ordered,  the  supplying  warehouse  is  the  local  warehouse  09%  of  the  time  and  uniformly  distributed 
among  all  the  other  warehouses  1%  of  the  tune.  The  implication  of  a  having  a  remote  warehouse 
involved  is  that  the  tuple  retrieved  from  the  stock  relation  may  be  on  a  different  node  if  the  warehou.se 
is  remote  and  the  database  is  configured  across  a  distributed  system.  We  will  assume  that  calls  to 
remote  warehou.ses  located  on  the  .same  node  incur  the  same  overhead  as  a  call  to  the  local  warehouse. 
To  uniquely  identify  a  stock  tuple  the  key  has  two  fields:  (item-id,  whouse-id).  A  specific  Stock  tuple 
contains  the  number  of  that  particular  item  in  stock  at  that  particular  warehouse.  In  addition,  the 
benchmark  .specifies  that  1%  of  the  transactions  should  be  rolled  back  to  simulate  entry  errors.  We 
ignore  this  aspect. 

Payment  Transaction 

This  transaction  processes  a  payment  by  one  of  the  customers.  There  are  two  ca.ses.  In  the  first 
case,  which  occurs  40%  of  the  time,  the  customer  is  selected  by  customer-id.  In  the  .second  case. 


wilicli  ocnirs  GO/^  of  llio  tinip,  tho  cvisloiiH'r  is  soloct.f'd  by  l:ist  ii;uin'.  Dm-  id  i  1h>  tnctlioil  specified 
by  ihe  i)PU(lniiark  for  the  population  of  tiu'  database  (eaeli  <li.stri<'t  lias  lUKKI  rustiitners  but  only 
lOOO  names),  on  average  three  cii.stomers  will  have  the  same  last  name.  ili(>  actual  cusloiiier  <  hnscn 
is  determined  by  selecting  all  customers  with  that  name,  sorting  on  the  first  name,  and  taking  the 
middle  one.  To  define  the  accesses  to  the  relation  we  will  assume  that  this  non-unique  select  has  the 
same  overhead  as  3  selects. 

Regardless  of  the  method  used  for  selecting  the  customer,  15%  of  the  transactions  assttme  the 
customer  is  paying  through  a  warehouse  other  than  the  customer's  home  warehouse.  The  input  is 
generated  as  follows: 

whouse-id  uniform 

dist-id  uniform 

case  1  customer-id  NU(1023, 1.3000 

case  2  customer-name  NU(25.5,lbound,ubound) 

Note  that,  in  case  two,  the  customer  name  is  drawn  from  the  NU  funtion  from  Ibound  to  ubound.  We 
assume  one  of  three  (Ibound, ubound)  pairs  are  chosen  with  equal  probability  as  ( 1 ,1000).  ( 1001 .2000). 
(2001,3000).  In  actuality  there  are  1000  unique  names  per  district  and  the  remaining  2000  names  are 
uniformly  drawn  from  these  1000  names.  Hence,  when  a  customer  is  specified  by  name  on  average 
three  tuples  satisfy  the  predicate  and  are  distributed  across  the  3000  tuples  in  some  manner  similar 
to  above.  We  have  chosen  the  distribution  above  to  keep  the  simulations  simple.  Below  we  list  the 
SQL  calls  made  by  the  transaction  in  a  simplified  format. 

1.  Select(whouse-id)  from  Warehouse 

2.  Select(dist-id, whouse-id)  from  District 

3.  (a)  Case  I;  Select(customer-id,dist-id, whouse-id)  from  Customer 

(b)  Case  2;  Non-Unique  Select(customer-name, dist-id, whouse-id)  from  Customer 

4.  Update( whouse-id)  in  Warehouse 

5.  Update(dist-id, whouse-id)  in  District 

6.  Update(customer-id, dist-id, whouse-id)  in  Customer 

7.  Insert  into  History 

8.  Commit 

Order  Status  Transaction 

This  transaction  determines  the  status  of  a  customer’s  last  order,  returning  information  about  the 
customer,  and  a  summary  of  the  order.  The  customer  is  determined  as  in  the  Payment  transartion, 
i.e.  f)0%  of  the  time  by  name  and  40%  by  customer  id. 
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whouse-id  uniform 

dist-id  uniform 

case  1:  customer-id  NU(  1023,1 ,3000) 

case  2  customer-name  NU(255,lbound,ubound) 

1.  (a)  Case  1:  Select(customer-id, dist-id, whouse-id)  fconi  Customer 

(b)  Case  2;  Non-Unique  Select(customer-naine, dist-id, whouse-id)  from  Customer 

2.  Select(Max(order-id), customer-id)  from  Order 

3.  for  each  item  in  the  order: 

(a)  Select(order-id)  from  Order- Line 

4.  Commit 

The  database  call  ”Select(Max(order-id), customer-id)  from  Order”  is  the  selection  of  the  tuple  in 
the  Order  relation  that  is  the  most  recent  order  placed  by  the  customer.  This  could  be  implemented 
as  a  max  aggregate,  or  an  order  by  descending  order-id  and  return  only  the  first  tuple.  Since  the 
Order  relation  keeps  on  growing  without  bound,  both  of  these  approaches  could  be  expensive.  This 
could  be  implemented  using  an  ordered  multi-keyed  index  so  that  correct  tuple  can  be  fetched  in  just 
one  index  look  up.  Hence,  in  our  studies  we  assume  this  requires  the  overhead  of  a  single  select. 

Delivery  Transaction 

This  transaction  processes  a  delivery.  The  transaction  assumes  that  during  a  delivery  the  oldest 
order  not  yet  delivered  for  each  district  within  a  warehouse  is  processed.  Hence,  there  are  really  10 
deliveries  per  delivery  transaction.  The  benchmark  specifies  that  this  transaction  has  less  stringent 
response  time  constraints  afid  can  be  executed  in  batch  mode,  i.e.  deferred  execution.  The  only 
input  to  the  transaction  is  the  whouse-id  which  is  uniformly  distributed.  The  transaction  proceeds 
as  follows: 

1.  For  each  district  within  the  warehouse  (i.e.  ten  times): 

(a)  Select(Min(order-id),whouse-id, dist-id)  from  New-Order 

(b)  DeJete(order-id)  from  New-Order 

(c)  Select(order-id)  from  Order 

(d)  Update(order-id)  Order 

(e)  For  each  item  in  the  order  (i.e.  ten  times); 

i.  Select(order-id)  from  Order-Line 

ii.  Update(order-id)  Order-Line 

(f)  Select(customer-id)  from  Customer 

(g)  Update(customer-id)  Customer 

2.  Commit 
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rite  tlatabast'  cal!  ''Selt'ctlMintoHler-itij.whiMiM' i<l  I  Irom  Ncv, Order  i-  iln'  eie,  ti.,;.  .,1 
the  tuple  in  the  Nevv-Orcler  relation  that  is  the  oldest  order  for  that  di'tiict  and  uaiehoa  e  in  tin- 
New-Order  rfdatioii.  As  in  the  Max  select  in  the  Order  Stains  transaction,  this  eonhl  lie  iiupteinenieil 
using;  a  niiilti-keyed  index  so  that  thi'  correct  tuple  c;ui  he  letched  in  just  one  call.  I  he  cin-ioine;  id 
used  in  the  Select  from  Custotiier  is  obtained  from  the  tujde  in  the  Order  relation. 

Stock  Level  Transaction 

Thi.s  transaction  determines  the  number  of  items  sold  by  orders  from  the  last  20  order-  o|  a 
specific  di.strict  that  liave  a  .stock  level  below  a  certain  threshold.  I'he  inputs  ar*'  *!'e  di-t  id.  winch 
is  uniformly  distributed,  and  the  threshold.  Below  we  (piote  the  samph'  SQL  code  direitly  from  the 
tpcc  document  [9]  so  that  we  do  not  confuse  the  (piery  by  oversimplification. 

SELECT  djiext-o_id  INTO  ;oJd 
FROM  District 

WHERE  d-wJd  =  :w'Jd  AND  dJd  =  :dJd  ; 

SELECT  COUNT(DISTINCT  (sJJd))  INTO  rstock.count 
FROM  Order-Line,  Stock 
WHERE 

oLwJd  =  .wjd  AND 
oLdJd  =  ;d_id  AND  oLoJd  <  :oid  AND 
oLoid  >  (;oid  -  20)  s.wJd  =  :wJd  AND 
sJJd  =  oLi_id  AND  s.quantity  <  threshold  ; 

In  the  above  query,  oLdJd  specifies  the  dist-id  attribute  of  a  tuple  in  the  ordorliiie  relation.  oJd  is 
the  order-id  attribute,  iJd  is  the  item  attribute,  and  wJd  is  the  warehouse  attrilmte.  The  first  sehui 
acquires  the  current  order  number  for  the  district  and  jilaces  it  in  the  variable  which  s'acils 

for  order-id.  Having  obtained  the  current  order-id  for  that  district,  the  ipiery  comimtes  a  join  of  the 
Order-Line  and  Stock  relatkms  to  find  the  number  of  distinct  items  orih-red  in  the  district's  la>t  20 
orders  which  have  a  stock  quantity  below  the  specified  threshold. 

Assuming  an  index  on  the  order-id  field  of  the  Order-Line  relation  and  a  two  keyed  iiide.x  on  the 
whouse-id  and  item-id  of  the  stock  relation,  the  ipiery  results  in  an  average  ot  20()  Oniii-i.iiK  .iiid 
Stock  tuples  each  being  fetched. 

To  summarize  the  access  patterns  of  the  five  transaction  we  list  the  number  ol  ac  cesses  to  e.irli 
relation  for  each  transaction  type  and  the  average  mimher  of  ai  cesses  per  transaction  in  I  able  2;  the 
latter  assumes  the  percentages  for  each  transaction  listed  in  Lahle  2.  Within  the  table,  the  notation 
(!(x)  signifies  that  x  tuples  are  chosen  Uniformly  from  the  relation.  A  f  (.r)  dmioti's  Nonl  iiiforni 
random  selection  of  x  tuples  using  the  AT’  function.  .'I(.r)  denotes  ./■  tuples  are  Appemh'd  to  the 
relation,  and  P{x)  denotes  x  tuples  are  chosen  where  the  tuples  chosen  were  recetitly  by 

Past  behavior  (in  other  words  there  is  a  form  of  temporal  locality).  Note  that  the  tuples  aen's-ed 
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Table  .1:  SiM'^marv  of  Relation  Accesses 


Relation  j  .New 

1  Order 

Payment 

Order 

Status 

Delivery 

.Slock 

Level 

Average 

warehous'’ 

U(l) 

fdl) 

r  . . . 

0.H7 

district 

P(i) 

O.'Ti 

rustonier 

NU(l) 

on 

NU(2.‘2) 

P{iU) 

I..V21 

stock 

NVflO) 

■gi 

P(20U) 

12.-1 

item 

NLfiO) 

4.4 

order 

A(l) 

P(IO) 

new-order 

P(10) 

0.4!) 

order-line 

P(IOO) 

P(200) 

1.1.3 

history 

A(l) 

0.43 

by  the  Or  ler-Status,  Delivery,  and  Stock-Level  transactions  are  more  likely  to  be  buffer  pool  hits 
since  they  are  for  tuples  that  have  been  recently  put  in  the  buffer  pool  by  the  New-Order  transaction. 
Many  of  the  tuple-ids  are  generated  from  the  NU(  )  function.  We  define  and  simulate  this  function  in 
the  next  section. 

3  Analysis  of  Data  Access  Skew  in  TPC-C 

The  TPC-C  benchmark  assumes  access  to  the  tuples  are  skewed,  i.e.  within  a  relation  some  tuples 
are  referenced  more  frequently  than  others.  In  this  section  we  define  and  simulate  the  non  uniform 
random  number  function,  as  .specified  by  the  TPC'-C  dorument.s,  used  for  the  generation  of  tuple 
id's.  The  non-uniform  random  number  generating  function,  .\’l’(),  which  we  paraphrase  from  the 
benchmark  specification  [9j,  is  defined  as  follows: 

N  (i{A.  T,  ij]  -  {{{ran(l{{],  A)  |  r(iiKl{x,y)]  +  ('Y/iifj  ~  x])  x  (1) 


where; 


•  randoiiif  x.v )  denotes  a  uniformly  distributed  iniegi'c  random  number  in  the  clii.'-ed  interval 

•  (’  is  a  constant  within  [()...A]. 

•  A  is  a  constant  chosen  according  to  the  size  cif  i{|e  range  [x..y], 

•  (N  7i.  .M)  stands  for  N  modulo  M, 

•  and  (N  j  .VI)  .stands  for  the  bitwise  If)gical  OR  of  .V  and  .M. 
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I'or  t ho  roinainilor  of  tliis  [lapi'r  wo  assuiiio  ('  0([iiaL-,  zoro  (tlio  1  htaiidanl  (ioi  uiiii'iit  ailow-, 

an  arbitrary  clifiKa'  of  {'  witliin  [()...\]),  Wo  choose  A  and  y  according,  tc,i  tli<'  spociiic  ations  lot  liio 
tifldi'  id  !)oiii<>  ociioratoti. 

First  wo  coiisidor  accossos  to  th<'  stock  and  item  rolalions.  Ail  Itiplo  id's  for  accossiiiii  those 
rolatioiis  are  drawn  from  the  Nl,'(Slf)l,l  ,100000)  distribution.  In  I'iguro  :{  wo  [dot  the  probability 
mass  fiinctitm  (I’MF)  for  thi.s  (Fistrihulion  as  obtained  from  simulating,  one  billion  samples.  1  he  [dot 
sin.ws  the  non  iinifortnity  in  across  arid  tht'  ])eriodicity  of  the  access  [jrobability  in  th('  first  [tarametru 
(MOI  j  of  the  .\r  futiction  above.  Tlie  number  of  cycles  equals  tiie  |  floor  of  the)  third  [taranieter 
divided  by  the  first  parameter  of  the  NU  function,  or  12  cycles  for  this  ca.se.  In  the  .•V[)pei!di.x  we 
shf>w  that  if  the  third  parameter  of  the  NU  function  is  a  power  of  two,  then  these  cycles  are  exact, 
and  we  derive  a  closed  form  expression  for  the  resulting  I’.MF.  Figure  d  is  hard  to  interpret  because 
of  the  large  number  (100.000)  of  ()oint.s;  hence,  we  plot  the  same  di.^tribiition  for  tuples  1  to  10.000 
in  Figiire  4.  In  this  figure,  the  non-uniformity  within  a  cycle  (bl91  point.s)  is  clear. 

While  the  non-uniformity  of  access  is  apparent  in  Figure  4,  the  degree  of  skew  is  not  clear.  Let 
o,  be  the  probability  of  accessing  tuple  f.  Let  be  the  fraction  of  the  relation  represented  by  that 
tuple.  .Note  ,i,  =  V  i.j  for  stock  tuples.  In  Figure  5  we  order  the  tuples  by  increasing  order  of 
o  (increasing  order  of  hotness)  and  plot  versus  cunmlative  probability  of  access 

versus  the  cumulative  fraction  of  the  relation.  If  a  relation  has  no  skew  the  curve  would  be  linear, 
hence  the  more  convex  the  curve  is,  the  more  skew  there  is.  For  the  moment  ignore  the  top  two 
curves,  and  focus  on  the  lower  curve  which  represents  the  access  skew  at  the  tuple  level.  The  graph 
.show.s  that  16%  of  the  acces.ses  go  to  about  80%  of  the  tuples,  or  alternatively,  84%  of  the  accesses  go 
to  about  20%  of  the  tuples.  There  is  even  more  skew  in  the  tail  of  the  distribution,  so  that  about  71%. 
of  the  accesses  go  to  about  10%  of  the  (hottest)  tuples  and  about  39%  of  the  accesses  go  to  about  2% 
of  the  (hottest)  tuples. 

In  most  typical  databases  data  is  stored  in  pages,  hence  w-e  need  to  determine  the  skew  at  the  page 
level.  We  first  assume  tuple.s  are  packed  into  pages  in  sequential  order  with  the  maximum  number  of 
whole  tuples  that  fit  per  page.  We  assume  the  remainder  of  the  page  is  wasted.  For  the  stork  relation 
13  (26)  tuples  fit  in  each  4K  (8K)  page. 

.Again,  w'e  order  the  pages  by  frequency  of  access  and  plot  the  cumulative  probability  of  access 
versus  the  cumulative  fraction  of  the  database  in  Figure  5  (top  two  curves).  The  top  curve  is  tor  an 
8KByte  })age  size  and  the  second  curve  i.s  for  a  4KByte  page  size.  For  a  iKBvte  page  size,  we  see 
that  2.5%  of  the  access  gcj  to  80%  of  the  data,  or  viewed  the  other  way  75%  of  the  accesses  go  to 
20%  of  the  data.  This  is  similar  to  the  so  called  ■'80-20'’  rule  where  80%  of  th(>  accesses  go  to  207(  of 
the  data.  Again,  there  i.s  a  more  sk('w  in  the  tail  of  the  distribution  azui  about  59%  of  (he  accesses 
go  to  about  10%  of  the  hottest  pages,  and  about  28%  of  the  accesses  go  to  about  2%  of  the  pages. 
The  smaller  page  size  results  in  more  skew'  than  the  larger  page  size  -inre  there  is  less  of  a  <  hanco  In 
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spread  out  the  hot  tuples  among  the  pages 

The  milder  skew  at  the  page  level  leads  to  the  question  of  whether  the  tuple  level  sk<>w  rar  In- 
obtained  at  the  page  level.  Flacking  tuples  into  pages  in  sequential  order  spreads  out  hot  tuples 
among  all  the  pages  of  the  relation.  A  sitnple  optimization  is  to  first  sort  the  tuples  from  hottest  ii, 
coldest  and  then  pack  them  into  pages  in  that  order.  Since  the  distribution  [taranieier.s  for  l  l’C  (' 
are  know  a  priori  and  are  static  in  time,  this  could  be  done.  (In  this  context  we  note  tliat  the  I  PCC 
standard  (Clause  1.4.1)  allows  clustering  of  tuples  within  pages.)  This  technitpie  would  also  work  for 
any  workload  v/bere  we  know  the  distriliution  of  accessing  tuples  within  the  relations  of  the  databa.o-. 
and  where  the  distribution  does  not  vary  with  time.  (Wo  note,  however,  that  in  many  real  workloads, 
while  there  is  considerable  skew  in  data  access,  the  access  distribution  is  often  not  static  in  time.) 
The  bottom  curve  in  Figure  5  is  the  resultant  skew  when  this  optimized  packing  of  tuples  is  used, 
and  Ls  virtually  indistinguishable  from  the  tuple  level  skew.  Hence,  the  optimized  packing  results  in 
more  skew  at  the  page  level  which  should  result  in  lower  mi.ss  rates  in  the  btiffer  pool.  As  a  further 
note,  this  optimized  tuple  to  oage  packing  approach  was  insensitive  to  page  size. 

Accesses  to  the  item  relation  exhibits  a  similar  skew  except  there  is  less  skew  for  the  non-optimized 
packing  approach  since  49  (99)  tuples  fit  per  4K  {8K)  page. 

Access  to  the  customer  relation  ’s  less  skewed  than  the  stock  and  item  relations  since  tuples 
are  accessed  by  both  tuple-id  and  customer-name.  Hence,  there  are  two  different  access  patterns 
which  are  superimposed  upon  the  relation.  If  the  customer-id  is  used  as  the  selection  key.  one 
tuple  is  selected  from  the  NU(  1023,1,3000)  distribution.  If  the  customer-name  is  used,  we  make 
the  simplifying  assumption  that  the  customer  name  is  selected  fron,  one  of  the  NT{2.')5.1 ,1000), 
NU(25-5, 100 1,2000)  and  NU(2.5.5, 2001 ,3000)  distributions  with  equal  probability.  Hence.  a,s  can  be 
derived  from  the  transaction  access  patterns  as  specified  in  section  2.2,  41.86'5f  of  the  accesses  to 
the  customer  relation  use  the  NU(1023, 1,3000)  di.stribution  and  58.14%  are  divided  equally  among 
NU(255, 1, 1000),  NU(25.5, 1001,2000),  and  NU(2001, 3000)  distributions.  In  Figure  6  we  plot  the  F’MF 
for  the  customer  relation  and  in  Figure  7  w'e  plot  the  versus  ^/i,.  We  note  that  there  is 

consideraoly  less  skew  for  the  customer  relation  than  for  the  Stock  relation. 


4  LRU  Buffer  Simulation 

In  this  section  we  outline  our  buffer  simulation  model  and  pre.sent  miss  rates  obtained  from  our  model. 
W'e  simula^^'d  the  buffer  pool  for  the  benchmark  assuming  an  LHT  replacement  policy.  We 

hypothesize  that  more  sophisticated  replacement  poiiries  could  result  in  an  evtui  larger  difference 
bf'tween  optiiriizefi  [)arking  r)f  tuples  and  noii-oj)timized  j)acking  of  tuples  since  they  should  be  able  to 
capitalize  more  on  the  access  skew.  In  otir  simulations  we  collected  confidence  intervals  using  batch 
means  with  30  batches  per  simulation  and  a  batchsize  f»f  !()(), 000  samples.  All  results  (i.e.  the  miss 
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rates  of  each  relation)  have  coufidena'  intervals  of  o'X  of  less  at  a 


|•oIlfid<'ll(  li'Vel. 


In  the  buffer  model,  we  simulate  transaetiotts  entering  the  s'  stem  seqiietitially.  and  do  nut  lon-  ide! 
the  case  where  multiple  transartions  may  be  in  the  systiuu  at  the  same  limi'.  1  h<  presence  uf 
concurrent  transactions  does  not  change  the  buffer  hit  ratio  sigtiificanlly  becausi>  the  fraction  of  pa«aes 
accessed  by  any  transaction  is  small  compared  to  tlie  buffer  size.  W’e  include  concurrent  tiansactions 
in  the  throtighput  model  in  Section  o.l.  When  a  transaction  enters  it  is  chosen  as  one  of  the  five  i  vpes 
according  to  the  distribution  for  each  type.  Each  transaction  generates  tuple  reipiests  and  itisert.s  as 
specified  in  Section  2.2.  The  simulation  keeps  track  of  the  last  order  placed  by  each  customer,  the  last 
20  orders  for  each  district,  and  which  tuples  are  in  the  New-Order  relation.  This  information  is  u-^ed 
by  the  the  Order-Status,  Delivery,  and  Stock-Level  transactions.  The  output  from  tiie  simulation  is 
the  miss  rates  for  each  relation  summed  over  all  transaction  types,  and  also  the  miss  rates  for  the 
accesses  by  the  Order-Status,  Delivery,  and  Stock-Level  transactions  in  isolation  to  Ite  used  as  inputs 
for  the  throughput  model. 

In  Figure  8  we  plot  the  miss  rates  versus  the  buffer  size  for  the  Stork.  Oustomer.  and  Item 
relations.  The  other  relations  all  have  significantly  lowe.  miss  rales.  We  include  curves  for  both  tlm 
sequential  packing  of  tuples  into  pages  and  the  optimized  packing  of  tuples.  The  curves  are.  from 
top  to  bottom,  the  Customer  relation.  Stock  relation,  and  Item  relation.  For  each  of  the  relations, 
the  optimized  packing  of  tuples  results  in  significantly  lower  miss  rates.  There  are  two  reasons  why 
the  Customer  relation  exhibits  a  larger  miss  rate  than  the  Stock  relation  even  though  the  Cu.s  omer 
relation  is  the  smaller  of  the  two.  The  first  is  that  the  customer  relation  has  less  skew  as  show  in 
Section  3.  The  second  is  that  the  stock  relation  is  accessed  more  frequently  as  show  in  table  3. 
The  item  relation  has  a  much  lower  miss  rate  since  the  relation  is  much  smaller  than  the  stork  and 
customer  relations  due  to  the  fact  that  the  item  relation  does  not  scale  with  the  number  of  war(>houses. 

The  optimal  packing  approach  results  in  significantly  lower  miss  rates  than  the  seqiuuitial  packii.g 
approach.  For  example,  the  miss  rate  for  the  stork  relation  for  a  buffer  size  of  .32M  is  30'X  lower  in 
absolute  terms  for  the  optimized  packing  approach  than  for  the  secpienlial  approach.  The  mi^s  rate 
for  the  stock  relati(m  averaged  over  all  buffer  sizes  consich'red  is  Ki'.if  lover  in  absolute  terms  for 
the  o|)tiniized  parking  approach  than  for  the  sequential  approadi.  1  hi--  significantly  lower  miss  rate 
translates  directly  t<r  a  iowt'r  I/O  rate,  and  heme  better  frerformame.  Similar  im[iro\emetit>  are  >een 
for  the  Customer  relatirm  mi.ss  rates  and  to  a  lt>ss»'r  extent  for  the  Item  la-lation. 

We  assume  20  Warehouses  at  a  node.  '1  he  reason  for  choosing  the  <  as('  of  20  \\arehoime>  relate,, 
to  the  throughput  model  in  Section  0,  where  it  is  estitiiafed  that  about  20  Warehouses  could  lie 
sujjported  by  a  10  MIPS  jtrocessor.  Heyonrl  a  sufficiently  large  nuiidx'r  of  wan'hoiisi's.  ti)e  bulfi'i 
hit  characteristics  airproximately  scale  with  the  number  of  Warr'houses.  Tlx'  reason  that  the  sialing 
is  not  exact  is  that  the  item  relatirm  doe.s  not  scale  with  the  numlx'r  of  Wan-house^,  Imt  it'>  ellru  t 
rliminishes  w'ith  an  incri'asf'  in  the  numbei  of  Warehou.sr's.  3  lu'  Wareliouse  and  Distrir  t  rehitioiis  an- 
suffiriently  srnaO  that  they  fit  in  the  buffr-r  (miss  rate  0%)  for  all  simulations  (umsirh'red. 
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Table  4:  Throughput  Model  Suintuary  :  Single  Node 
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5  System  Model  and  Performance  Estimates 

5.1  Throughput  Model  Description 

In  this  section  we  describe  our  throughput  model.  The  parameter  values  used  in  the  model  are 
similar  to  those  in  [3,  5];  they  do  not  reflect  any  particular  system,  but  are  intended  to  be  somewhat 
representative.  The  objective  is  to  identify  trends  rather  than  providing  specific  throughput  or  price- 
performance  estimates.  Our  model  incorporates  both  the  CPU  and  the  data  disks.  VVe  assume  that 
the  system  is  configured  with  a  sufficient  number  of  disk  arms  to  ensure  disk  arm  utilization  remains 
below  .50%  and  hence  the  CPU  is  the  bottleneck.  To  calculate  CPU  utilization  the  model  sums 
the  average  CPU  demand  per  transaction,  divides  by  the  MIPS  rating  of  the  processor,  and  then 
multiplies  by  the  throughput.  Our  primary  metric  is  maximum  throughput  which  we  obtain  by  fixing 
the  CPU  utilization  and  calculating  the  throughput.  To  calculate  the  disk  utilization  we  sum  the 
average  disk  demand  per  transaction  in  milli-seconds,  divide  by  the  number  of  disk  arms,  and  then 
multiply  by  the  system  throughput.  We  assume  that  there  is  a  .separate  log  disk. 

In  table  4  we  summarize  the  assumed  parameter  values  and  visit  counts  for  each  transaction  type 
for  a  single  node  system.  The  column  label  n  is  the  subscript  of  the  parameter.  In  the  equations 
below  we  will  use  On  to  denote  the  overhead  for  a  parameter  n  call.  VVe  define  visit  count  as  the 
number  of  times  a  transaction  requires  a  certain  operation  per  transaction  type.  The  visit  counts  are 
in  the  columns  heading  V\  . . .  V^.  We  define  V'i  j  to  be  the  visit  count  for  transaction  i  to  operation  j. 

.Most  of  the  parameters  in  the  table  are  self  evident  from  the  names  with  the  following  iio.ssible 
exceptions,  'fhe  parameter  application  for  application  (ode  laUwef'ii  S(Ji.  (alls,  the  paiami'ter 
snid/vf  ccim  is  for  the  (T’U  overhead  at  one  node  to  s('nd  and  receivf'  a  messagi'  across  the  network. 


tlir  paramelpr  n  hast  Lochs  for  tho  reloaso  lock  portion  of  tlx'  commit  pha^<■.  i>n  jiConnuil  for 
the  prepare  to  commit  ])ortion  of  a  2  phase  commit,  and  initlO  is  the  CiT  ovt-rhead  for  inif iat in^ 
an  I/O,  Tlie  overhead  for  releasing  locks  is  obtained  by  sinnmiiig  tlie  overlK'ad  to  relea>e  read  lockN 
and  write-locks  times  the  number  of  locks  iield  by  each  transaction  typr>  weighted  by  t  1h‘  p('r(  ent  id 
the  workload  comprised  by  each  transaction  type.  V\'e  a.ssnme  an  overhead  of  !K  instructions  for 
releasing  each  lock. 

The  parameters  rnc,  ini,  ms,  mo,  and  ml  found  in  l/,f,  and  I, ,1,4,  i  E  are  the  mi.s.s  rates 

for  the  Customer,  Item,  Stock,  Order,  and  OrderLine  relations  respectively.  These  miss  rates  are 
obtained  from  the  buffer  model.  Note  that  for  completeness  we  could  have  also  included  the  miss 
rates  for  the  Warehouse,  District  and  New-Order  relations  in  the  performance  estimates,  but  these 
miss  rates  are  always  negligibly  small  and  hence  are  omitted  from  the  table. 

The  overhead  for  the  non-unique  select  is  based  on  the  fact  that  on  average  three  values  are 
returned  and  need  to  be  sorted.  The  overhead  for  the  join  is  estimated  as  follows.  On  average  there 
are  200  items  ordered  by  the  last  20  order  transactions  and  hence  a  range  scan  returning  an  average 
of  200  items  is  invoked  to  create  a  temporary  table  for  the  outer  relation.  Each  one  of  the.se  tuples 
will  join  with  exactly  one  tuple  from  the  inner  relation.  Assuming  that  appropriate  indexes  exists  on 
the  inner  relation,  each  outer  relation  tuple  requires  an  indexed  select  on  the  inner  relation.  Finally, 
the  result  must  be  sorted  to  eliminate  duplicate  items.  We  assume  the  overhead  for  the  range  scan  is 
5K  per  tuple,  the  overhead  for  the  indexed  select  is  5K  instructions  per  tuple,  and  the  overhead  for 
the  final  sort  is  40K  resulting  in  a  total  CPU  overhead  of  2040K  instructions. 

In  table  6  we  summarize  the  visit  counts  which  differ  from  the  single  node  case  for  a  distributed 
environment  when  the  Item  relation  is  replicated  across  all  nodes,  i.e.  we  include  remote  calls  and 
distributed  commits.  In  table  7  we  summarize  the  visit  counts  assuming  the  Item  relation  is  not 
replicated.  The  visit  counts  for  the  Payment  transaction  are  the  same  for  both  replication  and  no 
replication  since  the  Payment  transaction  does  not  access  the  Item  relation.  Note  that  only  the  New- 
Order  and  Payment  transactions  differ  from  the  single  node  case  since  the  other  transaction  only 
access  local  warehouses  as  specified  by  the  benchmark. 

The  notation  found  in  tables  6  and  7  is  defined  in  table  5.  The  values  for  these  terms  are  derived  in 
Appendix  1. 

We  first  expHin  the  terms  when  the  Item  relation  is  replicated,  i.e.  table  6.  In  this  case  all 
accesses  to  the  Item  relation  are  local  because  the  relation  is  accessed  read  only.  We  assume  that 
the  distributed  Concurrency  Control  (CC)  protocol  allows  retention  of  read  locks  acros.s  transactions, 
and  only  requires  a  broadcast/semicast  when  acquiring  an  exclusive  lock.  ^ 

^Such  a  distribut<;d  CC  protocol  is  optimized  for  read-only  sharing  of  replicated  data,  and  fares  poorly  when  there 
i.s  significant  write  sharing.  Many  distributed  CC  protocols  with  replication  are  optimized  for  .significant  write  sharing, 
and  consequently  are  worse  for  read-only  sharing.  .See  (I,  2]  for  a  good  summary  of  distributed  C'C  protocol.s  and  [.tj  for 
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Table  5:  Definition  of  Notation 


symbol 

meaning 

expected  number  of  calls  for  obtaining  and  updating  .slock  tuples 

RCeust 

expected  number  of  calls  for  obtaining  and  updating  customer  tuples 

expected  number  of  calls  for  ol)taining  and  updating  item  tuples 

^  stock 

expected  number  of  unique  remote  sites  that  sup[)ly  stock  tuples 

f'CIiJt 

expected  number  of  unique  remote  sites  that  supply  customer  tuples 

^■‘itern 

expected  number  of  unique  remote  sites  that  supply  item  tuples 

expected  number  of  unique  remote  sites  that  supply  item  or  stock  tuples 

^ stock 

probability  that  all  stock  tuples  are  supplied  from  the  local  warehouse 

The  visit  counts  for  four  parameters  change:  commit,  send/receive,  prepCommit.  and  inillO. 
Although  portions  of  these  overheads  actually  occur  at  the  other  nodes,  all  the  other  nodes  will  be 
using  the  modeled  node  for  remote  calls,  so  by  symmetry  we  can  sum  the  overhead  at  the  modeled 
node. 

We  first  consider  the  NewOrder  transaction.  The  only  remote  calls  are  for  retrieving  and  updating 
stock  tuples.  The  number  of  remote  nodes  involved  in  a  2  phase  commit  is  Vstock-  The  visit  counts  for 
commit  and  initIO  are  each  increased  by  U stock  since  a  commit  must  be  done  at  each  node  involved. 
The  count  for  prepCommit  is  changed  from  zero  to  U stock  +  1  -  I'stock  since  the  prepare  portion  of 
the  two  phase  commit  must  be  done  at  every  site  plus  the  coordinator  minus  the  probability  that  the 
transaction  is  purely  local.  The  count  for  send/receive  is  change  from  zero  to  4  Ustock  +  2  RC'stock 
since  we  assume  2  round  trip  messages  must  be  sent  to  each  unique  remote  node  involved  in  the  2 
phase  commit,  and  one  round  trip  message  for  each  remote  call  for  retrieving  or  updating  a  stock 
tuple.  Note  the  multiplier  is  4  for  Ustock  (2  for  RC stock)  not  2(1)  since  we  model  the  overhead  at  all 
nodes  involved  on  the  coordinator  by  symmetry  arguments. 

For  the  Payment  transaction  the  only  remote  calls  are  for  obtaining  and  updating  customer  tuples. 
The  number  of  unique  remote  sites  involved  in  a  two  phase  commit  for  the  Payment  transaction  is 
Ucust-  The  new  visit  counts  for  the  payment  transaction  are  found  in  table  6  and  are  expressed  in 
terms  of  the  expectations  expressed  above. 

We  now  explain  the  terms  when  the  Item  relation  is  not  replicated,  i.e.  table  7.  The  visit  counts 
for  the  Payment  transaction  are  the  same  as  for  the  replicated  case  since  the  Payment  transaction 
does  not  access  the  Item  relation.  The  visit  counts  for  the  NewOrder  transaction  differ  since  the  10 
retrievals  of  the  item  tuples  may  require  a  remote  call  in  addition  to  the  remote  calls  for  stork  tuples. 
The  item  tuples  are  accessed  read  only,  hence  a  2  phase  commit  is  need  only  for  those  nodes  supplying 
a  stock  tuple.  7Te  number  of  remote  nodes  involved  in  the  2  phase  commit  is  Ustock-  Thus,  the  visit 
counts  for  initIO  and  prepCommit  are  the  same  as  when  the  item  relation  is  replicated.  A  1  phase 
commit  is  necessary  at  each  node  that  supplies  an  item  tuple  but  no  stork  tuples.  Hence,  the  numher 
of  nodes  involved  in  a  1  phase  commit  is  Unr,H  =  ('st„rk^iirm  -  Ust„rk-  Relative  to  the  replicated  case, 
an  analytical  comparison  of  distributed  with  data  replication. 


Table  6;  Throughput  Model  Suuiinary  :  Multi  Node  with  Replication 


resource 

parameter 

n 

overhead 

NewOrder 

Paymiull 

CPU 

commit 

5 

.30K 

1  +  I'stock 

1  + 

CPU 

initio 

6 

.3K 

1  +  me 

+  10(mi  +  ms) 

TUj  tock 

1  +  2.2  me 

+  Cru5( 

+  Ueuj« 

CPU 

send/receive 

8 

lOK 

^^3tock 

-f2  RCstock 

CPU 

prepCommit 

9 

I5K 

Ustock  1 

Lstock 

^cust 

the  visit  count  for  send/receive  is  increased  by  RCuem  for  obtaining  item  tuples  and  by  2Uium  for  the 
round  trip  message  necessary  for  each  node  that  participates  in  a  one  phase  commit.  The  visit  count 
for  commit  is  changed  to  include  commit  overhead  at  all  remote  nodes  Usiock+Hsm  t  whether  they  be 
involved  in  a  1  phase  or  2  phase  commit. 

Let  Vi^n  equal  the  visit  count  of  a  type  i  transaction  to  the  CPU  as  a  type  n  request.  The  values 
of  Vi^n  are  obtained  from  tables  4,  6,  or  7  depending  on  whether  the  system  being  modeled  is  a  single 
node  system,  distributed  system  with  the  Item  relation  replicated,  or  a  distributed  system  without 
replication  of  the  Item  relation.  Let  A  equal  the  system  throughput  and  Oi  denote  the  fraction  of  the 
workload  from  transactions  of  type  i.  The  utilization  of  the  CPU  is  calculated  as: 


Utilcpu  = 


MIPS 


Let  DA  =  the  number  of  disk  arms.  The  utilization  of  the  disk  is  calculated  as: 

( X^!~l  Ctj  •  Pi,]4  •  Oi4  \ 


Utild 


iak 


DA 


{;i) 
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Table  7;  Throughput  Model  Summary  :  Multi  Node  No  Replication 


resource 

parameter 

n 

overhead 

NewOrder 

u, 

CPU 

commit 

5 

30K 

1  +  U,tocH-item 

CPU 

initio 

6 

5K 

1  +  me  4-  I0(mi  -1-  ms) 

+  Uj(oci: 

CPU 

send/receive 

8 

lOK 

2RCstock  +  2RCitcm 
"f"  2Uiicm 

CPU 

prepCommit 

9 

15K 

^atock  "1"  1  ^atoek 

5.2  Single  Node  Performance  Estimates 

In  this  section  we  present  our  results  for  a  single  node  system  running  the  TPC-C  benchmark,  for 
the  parameter  values  and  assumptions  given  above.  We  assume  the  MIPS  rating  of  the  processor  is 
10  MIPS.  We  obtain  the  maximum  throughput  by  fixing  the  meiximum  CPU  utilization  at  80%  and 
calculating  the  throughput  using  the  throughput  model  outlined  above.  We  then  obtain  the  number 
of  disks  needed  by  fixing  the  maximum  disk  utilization  at  50%  and  finding  the  minimum  number 
of  disks  such  that  disk  utilization  is  less  than  or  equal  to  50%.  Note  that  typical  configurations 
are  designed  so  that  the  average  disk  utilization  is  lower  than  the  50%  we  assume,  so  as  to  take  into 
account  variance  in  the  disk  load  (for  example  see  [8]).  However,  in  a  benchmark  environment  a  higher 
disk  utilization  may  be  permissible  because  of  a  smaller  variance  in  the  disk  load.  .AiU  experiments 
assume  a  4K  page  size. 

In  Figure  9  we  plot  the  maximum  throughput  in  new-order  transactions  per  minute  versu.s  buifer 
size.  The  curves  from  top  to  bottom  are  for  optimized  packing  of  tuples  into  pages  and  non-optimized 
packing  of  tuples  into  pages. 

The  maximum  percentage  difference  between  the  methods  occurs  at  a  buffer  size  of  44  megabytes 
where  the  optimized  workload  results  in  a  2.5%  higher  throughput  relative  to  the  non-optimized 
workload.  The  average  throughput  improvement  (averaged  over  all  64  buffer  sizes  plotted  in  Figure  9 
is  1.0%  relative  to  the  non-optimized  workload.  Hence,  based  on  maximum  throughput  there  is  little 
incentive  to  pack  all  the  hot  tuples  into  separate  pages  versus  just  loading  the  database  in  sequential 
order. 

In  Figure  10  we  plot  the  cost  per  transaction/minute  versus  buffer  size,  where  we  define  (X)st  a.s 
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tlip  cost  of  tlip  iiiptnory,  disks  (itichuliiig  suificu'iit  stur;*.g('  s|(;u(>  for  all  ti'latiuns).  and  t  Ik'  < 

UV  e>nphaf:ize  that  this  is  not  the  cost  ns  spccijint  hij  tin  I'i’C-C  in  nchninrk  siitc(  it  ihx  s  not  inrhnh 
software  cost,  niaintt  nance  cost,  terminal  cost,  etc.  Tin  int<  nt  is  to  tstimnh  tin  optimnl  (Intnhnsi 
memory  buffer  size  in  the  trade-off  between  memory  and  disks.  I'lio  s'oragp  cost  is  co!!:;!!!ic<i  hy 
summing  the  storage  needs  for  the  Warehouse,  District,  Customer,  Stock,  and  Item  rclalioiis  as 
specified  in  table  1.  Assuming  20  warehouses  per  node  (witich  leads  to  about  S0‘X  CIM’  utilization), 
the  space  required  is  1.1  Gbytes.  In  addition,  we  must  include  sullicient  storage  for  running  the 
benchmark  for  180  8  hour  days  as  specified  by  the  benchmark.  Each  ,\ewOrder  transaction  insi'rts  1 
order  tuple,  and  10  order-line  tuples.  In  addition  each  Payment  transaction  inserts  one  History  tuple. 
By  multiplying  the  transaction  rate  times  the  number  of  bytes  needed  for  these  inserts  we  arrive 
at  approximately  11  Gbytes  of  disk  space  per  node  needed  for  storing  these  three  relations.  This 
space  requirement  scales  linearly  with  the  throughput.  We  assume  each  3  Gbyte  disk  costs  $5000, 
the  processor  costs  $10000,  and  memory  costs  $100  per  megabyte.  Although  these  hardware  costs 
are  debatable  and  will  quickly  be  out  of  date,  they  enable  us  to  present  a  methodology  which  can  be 
used  for  determining  the  optimal  price/performance  point.  This  method  is  beneficial  in  determining 
how  much  memory  versus  disk  arms  the  system  should  be  configured  with. 

We  first  focus  on  the  bottom  two  curves  in  Figure  10.  These  two  curves  do  not  include  the  storage 
capacity  needed  for  maintaining  the  Order,  Order-Line,  and  History  relations.  The  top  curve  of  the.se 
two  is  for  a  workload  with  sequential  packing  of  tuples  into  pages,  while  the  bottom  curve  is  for  the 
case  of  optimal  packing  of  tuples  into  pages  (we  will  refer  to  this  as  optimal  packing).  The  jagged 
shape  of  the  curves  results  from  the  adding  of  memory  until  the  disk  utilization  drops  sufficiently 
to  configure  the  system  with  one  less  disk  and  still  have  a  utilization  of  less  than  50%.  The  lowest 
point  on  the  y  axis  for  each  curve  corresponds  to  the  optimal  cost/performance  point  and  shows  the 
corresponding  amount  of  database  buffer  memory.  (Note  again  that  this  is  not  the  entire  system 
cost.)  The  lowest  points  occurs  for  a  154  Mbyte  buffer  with  a  value  of  about  $139/tpm  for  sequential 
packing,  and  at  84  Mbyte  with  a  value  of  about  $107/tpm  for  the  optimal  packing  case.  Thus,  the 
optimized  packing  of  tuples  results  in  about  a  30%  improvement  of  price  performance  relative  to 
sequential  parking. 

The  top  two  curves  in  Figure  10  include  the  the  storage  capacity  needed  for  maintaining  the  Order. 
Order-Line,  and  History  relations.  In  this  case,  adding  memory  causes  the  disk  utilization  to  drop 
sufficiently  to  configure  the  system  with  less  disks,  but  the  required  storage  capacity  precludes  removal 
of  additional  disks.  A  minimum  of  4  disks  are  required  for  storage  rapacity  requirements.  The  lowest 
points  occurs  at  a  52  Mbyte  buffer  with  a  value  of  about  $167/tpm  for  sequential  packing,  and  at  26 
Mbytes  with  a  value  of  about  $T54/tpm  for  the  optimal  packing  case.  Thus,  the  optimized  parking 
of  tuples  results  in  about  an  8%  improvement  of  price  performance  relative  to  seipiential  packing. 
Put  another  way,  the  system  is  disk  bandwidth  bound  for  memory  sizes  less  than  26  megabytes  (52) 
for  the  optimized  (non-optiniized)  ca.se,  and  storage  capacity  bound  for  larger  memory  sizes.  Hence, 
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there  is  no  benefit  obtained  from  adding  additional  memory  beyond  those  points.  Note,  given  the  rate 
at  which  disk  size  is  currently  increasing  the  system  will  become  disk  bandwidth  bound  in  the  near 
future  rather  than  storage  capacity  bound,  in  which  case  the  cost/performance  difference  will  become 
closer  to  the  30%  predicted  when  storage  costs  are  not  included.  For  example,  when  a  $5000  6  flbyte 
disk  is  assumed  the  cost/performance  improvement  resulting  form  optimal  packing  is  20%.  If  a  12 
Gbyte  disk  is  assumed  the  entire  database  fits  on  one  disk  and  the  cost/performance  improvement  is 
30%. 

From  this  simple  model,  we  conclude  that  depending  on  the  disk  bandwidth  to  storage  capacity 
ratio,  the  (hardware  cost)/performance  ratio  may  be  improved  by  up  to  30%  by  careful  loading  of 
the  database,  i.e.  packing  all  hot  tuples  into  the  same  set  of  pages.  Note,  this  does  not  take  into 
consideration  the  cost  of  the  software  or  software  maintenance  which  when  all  lumped  together  will 
reduce  the  percent  difference  significantly. 

5.3  Multiple  Node  System  Estimates 

In  this  Section  w,.  jyfeseiit  uur  lesuils  for  a  multiple  node  distributed  system  running  the  TPC-C 
benchmark.  We  assume  each  node  contains  20  warehouses  and  all  data  pertaining  to  the  node 
(except  the  item  relation  in  the  non-replicated  case)  is  located  c  that  node.  We  consider  two  cases. 
The  first  case  is  when  the  item  relation  is  replicated  across  all  sues.  Since  the  item  relation  is  read¬ 
only,  replication  protocols  could  be  optimized  for  this  case  resulting  in  little/no  overhead  for  replica 
management.  Note  that  in  a  real  database  this  would  not  be  a  trivial  task  if  the  Item  relation  can 
be  changed.  The  second  case  assumes  that  the  Item  relation  is  not  replicated,  but  rather  partitioned 
equally  among  the  nodes.  In  this  case,  all  accesses  to  the  item  relation  will  incur  a  remote  call  with 
probability  where  N  is  the  number  of  nodes  in  the  system.  In  addition  a  one-phase  commit 

involving  each  node  that  supplies  an  item  tuple  is  necessary. 

In  figure  11  we  plot  the  maximum  throughput  versus  the  number  of  nodes  for  a  buffer  size  of  102 
Mbytes.  We  only  plot  results  for  the  optimized  packing  model;  results  for  the  non-optimized  model 
are  similar.  The  top  curve  is  for  comparison  purposes  only,  and  repre.sents  a  perfectly  linear  growth 
in  maximum  throughput  with  the  number  of  nodes.  The  second  curve  is  for  the  ca,se  where  the  Item 
relation  is  replicated,  and  the  third  curve  is  for  the  case  where  the  Item  relation  is  not  replicated. 

The  benchmark  scales  almost  linearly  when  the  Item  relation  is  replicated.  This  excellent  scaleiip 
occurs  because  only  10%  of  the  New-Order  transactions  and  1.5%  of  the  Payment  transactions  involve 
a  remote  warehouse.  When  the  Item  relation  is  not  replicated  the  benchmark  does  not  scale  as  well 

since  each  New-Order  transaction  must  make  10  remoted  calls,  one  for  each  item  ordered.  The 

replicated  case  has  a  10,  30,  and  39%  higher  throughput  than  the  non-replicated  case  for  2,  10.  and 
.30  nodes  respectively.  Hence,  if  the  benchniark  is  to  be  run  on  a  distributed  system,  replication  of  the 
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Item  relation  v.ill  greatly  improve  ''stem  performance.  should  (‘mphasize  that  this  assumes  tlse 
use  of  a  concurrency  protocol  (CC)  which  only  requires  remote  access  only  when  actiuiring  exclusive 
locks,  i.e.  the  concurrency  control  (C'(l)  protocol  is  optimized  for  read-only  sharing  so  that  no  remol<‘ 
calls  are  made  for  CC  for  the  replicated  item  relation.  If  a  protocol  optimized  for  write  sharing  were 
used,  the  performance  would  drop  considerably.  For  instance  if  the  primary  copy  prottjcol  [2]  were 
used  for  replication,  there  would  be  little  performance  gain  over  the  non-replicated  systtun  since  locks 
would  have  to  be  acquired  remotely  for  each  access. 

The  TPC-C  benchmark  specifies  that  for  each  item  ordered  in  the  New-Order  transaction  only 
1%  are  stocked  by  a  remote  warehouse.  In  addition,  the  benchmark  specifies  that  l')%  of  customers 
making  payment  via  the  Payment  transaction  are  making  the  payment  through  a  remote  warehouse. 
These  specifications  result  in  a  very  low  percentage  of  remote  calls  and  hence  the  good  scale- ups 
shown  for  the  replicated  case  shown  in  Figure  11.  We  now  examine  the  sensitivity  of  the  results 
to  this  assumption.  In  Figure  12  we  plot  the  maximum  throughput  versus  the  number  of  nodes  for 
different  probabilities  of  ordering  items  stcc'  ed  by  a  remote  warehou.se  in  the  new  order  transaction. 
We  see  that  if  the  probability  of  remotely  stocked  items  increases  to  1.0  the  scale-up  decreases  by 
about  44%.  Note  that  even  at  a  probability  of  remotely  stocked  items  of  1.0,  most  of  the  accessi 
are  still  local  since  only  43%  of  the  transactions  are  New-Order  transactions,  and  of  these  only  the 
ten  stock  tuples  selected  are  remote;  the  warehouse,  customer,  district,  and  10  item  tuples  selectiotis 
are  all  local.  The  TPC-C  benchmark  favors  distributed  systems  by  having  a  very  small  percentage  of 
remote  calls. 

6  Summary  and  Conclusion 

In  this  paper  we  modelled  the  TPC-C  benchmark  for  single  node  and  multiple  node  distributed 
database  systems.  One  key  difference  of  the  TPC-C  benchmark,  from  the  debit-credit  benchmark 
of  TPC-A,  is  that  it  includes  significant  skew  (i.e.,  non-uniform  access)  within  several  key  relations. 
By  contrast,  the  TPC-A  benchmark  has  uniform  access  within  each  relation,  and  in  particular,  each 
account  in  the  large  account  relation  is  accessed  with  equal  probability.  As  a  consequence,  in  TP( '-.A 
each  account  tuple  is  accessed  infrequently  and  it  is  not  beneficial  to  hold  them  in  a  memory  bufl'er. 
Therefore,  one  focus  of  this  paper  was  to  quantify  the  access  skew  in  the  TPC-C  l)enchmark,  and  to 
examine  it’s  impact  on  the  optimal  sy.stem  configuration,  price-performance  and  scalability. 

To  this  end,  we  first  quantified  the  tuple  data  access  skew  as  specified  in  the  benchmark.  Consider 
the  stock  relation  as  an  example  for  quantifying  the  access  skew.  ,\f  the  tuple  level  we  found  that 
about  84%  of  the  acce.s.ses  go  to  about  20%,  of  the  hottest  stock  tuples.  I'lH'rt'  is  even  more  skew  iti 
the  tail  of  the  distribution,  sc.)  that  about  30%  of  the  accesses  go  to  about  2%,  of  the  (hottest)  tu[»les. 
Since  the  database  buffer  is  typically  organized  as  pages,  we  next  examined  the  skew  at  the  |)age 
level.  If  tuples  are  inserted  sequentially  by  key  {c)r  randomly)  then  hot  tuples  are  scattert'd  among 
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the  pages  in  the  database.  As  a  consequence,  the  skew  at  the  page  It'vel  is  milder  tlian  that  at  the 
tuple  level.  Specifically,  about  75%  of  the  accesses  go  the  hottest  20%  of  the  pages.  Again,  tin  re  is 
a  more  skew  in  the  tail  of  the  distribution  and  and  about  28%  of  the  accesses  go  to  about  2%  of  tlie 
pages.  VVe  then  considered  clustering  the  ho  uples  into  the  same  pages  in  an  optimal  manner.  This 
is  possible  for  the  TPC-C  benchmark  becau.se  the  acce.ss  probabibtics  are  static  in  time  and  known 
a-priori.  If  this  were  done,  the  resulting  skew  at  the  page  level  is  about  the  same  as  that  at  the  tuple 
level,  in  term'-  of  the  fraction  of  accesses  that  go  to  any  specific  fraction  of  data. 

Having  quantified  the  access  skew,  we  examined  the  buffer  hit  ratio  versus  buffer  size  character¬ 
istic,  assuming  an  LRU  buffer  replacement  policy.  We  quantified  this  for  each  relation,  both  for  the 
case  of  sequential  assignment  of  tuples  to  pages  and  for  that  with  hot  tuples  clustered  within  pages. 
Significant  differences  in  the  buffer  hit  ratio  was  found  for  these  two  cases.  The  specific  hit  ratios 
and  the  difference  for  the  two  cases  differs  for  different  relations.  In  absolute  terms  it  is  largest  for 
the  customer  relation,  but  the  higher  frequency  of  acce.ss  to  the  stock  relation  makes  this  relation 
dominant. 

The  results  of  the  buffer  model  were  fed  to  a  throughput  model  to  examine  the  overall  throughput 
and  optimal  memory  and  disk  configuration.  The  access  skew  makes  the  results  rather  different  from 
that  for  the  TPC-A  benchmark  where,  as  outlined  above,  buffering  any  of  the  account  tuples  is  of 
little  value.  For  the  TPC-C  case,  almost  all  the  item  tuples,  the  hotter  stock  tuples,  and  some  of 
the  customer  tuples  are  buffered  in  the  estimated  optimal  configurations.  The  optimal  configurations 
depend  on  the  specific  costs  of  disks  and  memory,  specific  estimates  are  given  in  Section  5.2. 

We  also  found  that  depending  on  the  disk  bandwidth  to  disk  storage  capacity  ratio,  packing  hot 
tuples  into  pages  may  result  in  significant  benefits  in  terms  of  price-performance.  We  note,  however, 
that  this  observation  applies  only  to  a  workload  where  the  access  probabilities  do  not  vary  with  time, 
and  where  they  are  known  a-priori.  In  this  sense,  the  TPC-C  benchmark  is  not  (piite  representative 
of  many  real  workloads,  where  often  neither  of  these  conditions  apply. 

Finally,  we  examined  the  scalability  of  the  TPC-C  workload  in  terms  of  how  the  throughput 
can  be  expected  to  grow  with  the  number  of  nodes  in  a  distributed  database  sy.stem.  Like  the 
TPC-A  benchmark,  the  TPC-C  benchmark  is  largely  partitionable,  and  close  to  linear  scale-up  in 
the  number  of  nodes  can  bo  obtained.  This  a.ssumes  that  the  read-only  item  relation  is  replicated 
across  all  nodes,  and  that  ricj  remote  communication  is  neculed  for  concurrericy  control  for  acce.ss 
to  this  read-only  relation.  Specifically,  if  the  Item  relation  is  replicated,  there  are  h'w  remote  calls 
in  the  workload.  In  the  New-Order  transaction  on  average  0.1  stock  tu])les  accessc'd  and  u])i!aie(l 
are  from  a  remote  warehouse.  .Since  the  New-Order  transaction  selects  25  tuples  thesf'  0.1  rfunote 
calls  comprise  only  0.4%.  of  the  New-Order  transaction  workload.  In  the  PaviiHuil  transaction  0.55 
(0.15  X  2.2)  custcmier  tuples  accessed  are  from  and  npdaterl  are  from  a  remote'  warc'house.  Since'  the' 
Payment  tra,nsaction  selects  4,2  tuples  these  0.55  remote  calls  cemiprise  only  7.9%  of  (he  Payme'iit 
wejrkloarl.  1  he  Order-Status,  Delivery,  and  .Stock-Level  transactions  access  11.4.  150,  and  401  1u|)!e's 
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respectively.  Hence,  once  weighted  by  the  percentage  of  the  workload  only  of  the  aro'SM's  are 

to  remote  data.  This  low  fraction  of  remote  access  should  be  carefully  considered  when  using  tin' 
TPC-C  benchmark  to  as.sess  the  performance  of  a  distributed  or  clustered  datat)ase  systn'in. 

In  a  real  environment,  the  item  relations  would  be  updated  albeit  infrequently,  and  provision 
would  have  to  be  made  for  this.  If  a  general  concurrency  control  protocol  was  used  for  this,  e.g.  tlie 
primary  copy  approach,  or  if  the  item  relation  is  not  replicated,  then  the  scale-up  as  a  function  of  the 
number  of  nodes  is  significantly  lower,  as  we  have  quantified.  Even  so,  the  fraction  of  remote  calls 
is  rather  small.  While  we  have  focussed  on  examining  the  TPC-C  benchmark,  the  methodology  we 
have  used  has  more  general  applicability. 
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Appendix  A  Derivation  of  probabilities  for  throughput  model 


In  this  appendix  we  derive  the  expected  number  of  remote  re<juests  and  \inique  sites  involved  for  a 
distributed  system.  These  terms  are  used  in  Section  5. 
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\\V  iirst  (If'rivc  tlie  probahilii 'k's  assmiiini'  iho  Item  r<'lat‘niii  is  rcplicai^'d  and  thi’n  dnrivn  i  In- 
])robahiliti('s  assuming  iio  r('])iicat inn. 

Appendix  A.l  Item  relation  is  replicated 

When  the  ilom  relation  is  replicated  rerpiests  for  item  tuples  are  always  !o<  al.  i  he  otily  reniote 
accesses  possibly  needed  are  for  stock  tuples  by  the  NevvOrder  transaction  and  for  customer  tuples  by 
the  Payment  transaction.  We  first  consider  the  NewOrder  transaction.  Fhe  .XewOrder  t ratisaction 
requests  10  stock  tujjles,  each  ttiple  belonging  to  a  remote  warehouse  with  proltability  0.01  as  s[)erified 
by  the  benchtnark.  As.suine  there  are  A’  nodes  in  the  system.  Let  /’[.S’j]  be  the  probability  that  j  of 
the  10  stock  tuples  accessed  are  remote. 

where  Ps  -  0.01  S  is  the  number  of  nodes  in  the  system.  The  term  0.01  is  the  probability 

that  an  individual  stock  tuple  is  from  a  remote  warehouse,  and  probability  that  the  remote 

warehouse  is  located  on  a  remote  node.  We  make  the  simplifying  assumption  that  requests  to  remote 
w'arehou.ses  located  on  the  same  node  require  the  same  overhead  as  a  local  request. 

Let  be  the  expected  number  of  remote  stock  tuples  retrieved  made  by  the  .N'ew-Order  transac¬ 

tion. 

10 

r=o 

Elach  tuple  retrieved 's  also  update,  hence  the  expected  total  number  of  remote  calls  by  the  .NewOrder 
transaction  for  reading  and  writing  stock  tuples  is 

RCsi„ek^2  '<  K[R,]  (ti) 

Let  Lstork  be  the  probability  that  all  stock  tuples  are  referenced  lf>ca!ly. 

=  -  I's-)'”  (T) 

The  number  of  remote  sites  involved  in  the  transaction  is  the  number  of  uni(pie  sites  from  widt  h 
stork  tnple.s  are  tibtained.  We  derive  this  expectation,  in  tlie  following  theortun. 


Theorem: 


0 stock  =  E-=0  -  1) 


Proof: 

Assume  the  system  has  N  nodes,  and  that  a  site  generates  j  remote  requests.  With  out  loss  of 
generality,  assume  the  originating  site  to  be  node  1. 

Let  i  6  (2 . . .  iV)  be  an  indicator  variable  for  the  event  that  node  i  supplies  at  least  one  tuple.  A 
remote  request  is  satisfied  by  one  of  the  A"  —  1  nodes  with  equal  probability,  hence  the  probability 
that  a  node  supplies  at  least  one  tuple  (the  probability  that  the  indicator  variable  is  1)  is 


The  expected  number  of  unique  sites  supplying  tuples  is 

Unconditioning  on  the  number  of  remote  requests,  j,  results  in  the  expected  number  of  unique  sites; 


We  now  derive  the  expectations  for  the  Payment  transoction.  The  only  remote  acces.ses  are  for 
tuples  from  the  customer  relation.  The  customer  is  from  a  remote  warehouse  with  probability  O.l-a. 
The  customer  is  selected  based  on  customer-id  40%  of  the  time  (hence  one  tuple  is  seiected).  ai  d 
based  on  cmstomer-name  60%;  of  the  time  (hence  three  tuples  are  selected).  In  addition,  once  the 
tuple  has  been  selected  the  update  must  be  written  back  to  the  remote  node.  Hence,  the  expected 
number  of  remote  calls  for  obtaining  and  updating  customer  tuples,  RCcust- 

R^-cust  =  O.l.u  l(0.4)(  1)  -k  (0.f)){:J)  +  1]  (.S) 

At  most  one  remote  site  may  be  involved  and  hence  the  expected  number  of  uni(pH>  remote-  siti-s  from 
which  customer  tuples  are  obtained,  is: 


l‘,usi  ^  OAr,  (J^-y  I!)) 

Appendix  A. 2  Item  relation  not  replicated 

VVe  now  derive  the  expectations  assuttiing  the  item  relation  is  not  replicated,  'i'he  ex|)ectationh  for 
the  Paynienl  transaction  are  the  same  as  for  the  replicated  case  sitiee  the  Payment  transaction  docs 
not  access  the  Item  relation. 

For  the  NcwOrder  transaction  the  nnmber  of  remote  calls  for  stock  tuples.  i:.  expected 

number  of  unique  sites  supplyin;,  stock  tuples,  Ugtock^  probability  that  all  stock  tuples  are 

.supplied  locally  are  the  same  as  when  the  item  relation  is  replicated.  The  difference  from  the  replicated 
case  is  that  the  10  item  tuples  retrieved  may  be  remote  since  we  assume  the  item  relation  is  uniformly 
di.stribiited  among  the  N  nodes. 

Let  P[/j]  be  the  probability  that  j  of  the  10  item  tuples  accesses  are  remote. 

)  (PiY  (1  -  (!0) 

where  Pj  =  is  the  probability  that  an  item  tuple  is  located  on  a  remote  node,  and  A'  i.s  the 
number  of  nodes  in  the  system.  Let  E[Ri]  be  the  expected  number  of  remote  item  tuples  retrieved 
made  by  the  New-Order  transaction. 

10 

}=o 

The  number  of  remote  calls  for  item  tuples,  /?C, is  equal  to  E[R  i]  since  the  tuples  are  not  updated. 

Let  (JuejH  Le  the  expected  number  of  unique  remote  .sites  involved  for  fetching  the  remote  item  tuples. 
This  expectation  is  derived  as  in  theorem  1. 

f  /  A'  —  2\  ■' 

=i:/'ici(iv-i)  |i  -  (^i)j  (I-.!) 

In  addition  to  Unlock  fi.nd  wc;  need  the  expected  total  niimlier  of  uni((ue  nrdes  referenced  (he 

NewOrder  transaction,  fJstnck+,trm-  "Lhe  ex[)ecled  number  of  unique  sites  given  j  stork  t  uple  roipiests 
and  k  item  tuple  requests  is  equal  to 

2.') 


Hence,  upon  unconditioning  on  j  and  k, 

10  10 

I'"  stock-^itern  —  EE  c[y/>M(.v-i) 
j=0  k=0 

Appendix  A. 3  Proof  of  Perodicity  of  the  NURand  Function 

In  this  appendix  we  show  that  the  NURand(x,0,y)  function  is  peridoc  if  both  x  and  y  are  a  power  of 

2,  X  <  y.  Although  the  function  is  not  exactly  peridoc  when  y  is  not  a  power  of  2.  we  have  observed 

it  to  be  close  to  periodic. 

NU Rand{x,0,y)  —  {{Tandoni{Q,x)  \  random[0,y))%y)  (11) 

where 

random(x,y)  denotes  a  uniformly  distributed  integer  random  number  in  the  closed  interval  [x..y] 
(N  %  M)  stands  for  N  modulo  M 
(N  I  M)  stands  for  the  bitwise  logical  OR  of  N  and  M 
Let  a;  =  2“  -  1  and  y  =  2^  -  1,  6  >  a 
Let  z  =  b  -  X, 

Let  A  =  /U-i  A{,_2  . . .  Aq  be  the  binary  representation  of  the  number  drawn  from  randoni(O.x). 

Let  B  =  Bf-i  Bb-2  ■  ■  ■  Bohe  the  binary  representation  of  the  number  drawn  from  random(O.y). 

Note  that  if  A  >  0,  then  the  top  z  bits  of  A  will  all  be  zero. 

Let  PlA,]  denote  the  probability  that  bit  A,  is  set  to  one.  Then, 

F[A.]=  i,t€(0,l,...(fz-l)) 

B\A,]  =  0,  f  >  a 

r[B,]  =  (0,1, ...(/>-  1)) 

Let  ('  ~  A\B  =  Cit-t  Ch-2  ■■  -Lo-  Since  A,  and  B,  are  independent  for  all  i,  liit  (\  is  sei  if  either  -1, 
or  B,  or  both  are  set.  Hence, 


20 


/'K']=  2,,  g  (0,...(a  -  D)  F[C]=  1)) 

Thus,  the  the  probability  of  acce.ssing  a  specifir  tnplo-id  generated  from  the  N  I’ Raiidfx.O.y )  fum  tion 
is  )'  where  i  equal  the  number  of  non-zero  bits  in  the  low  a  bit.  j  is  the  number  of  zero  !)its 

in  the  low  n  bits,  and  z  hs  as  defined  above.  Hence,  the  probability  mass  function  i.s  periodic  where 
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Customers 

Figure  1:  TPC-C  Business  Enviornment. 

Reproduced  with  permission  from  the  TPC 


Figure  2:  TPC-C  Entity/Relationship  Diagram. 

Reproduced  with  permission  from  the  TF*(’ 
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Figure  3:  Stock  Relation  PMF 
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Figure  5;  Stock  Relation  CDF 
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Figure  7:  Customer  Relation  CDF 
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Figure  10:  Price  Performance 
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Figure  11:  Scaleup  of  TPC-C 
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maximum  throughput  (tpm) 


Figure  12:  Sensitivity  to  Percent  Remote 
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pla' forms  running  a  medium  complexity  transaction  processing  workload  Some  key  aspects  in  which  this  new- 
benchmark  differs  from  the  TPC-A  benchmark  are  in  having  several  transaction  types,  some  of  which  are  more 
complex  than  that  in  TPC-A,  and  in  having  data  access  skew  In  this  paper  we  present  results  from  a  modelling 
study  of  the  TPC-C  benchmark  for  both  single  node  and  distributed  database  management  systems.  We  simulate 
the  TPC-C  workload  to  determine  expected  buffer  miss  rates  assuming  an  LRU  buffer  management  policy  These 
miss  rates  are  then  used  as  inputs  to  a  throughput  model.  (.From  these  models  we  show  the  following:  (i)  We 
quantify  the  data  access  skew  as  specified  in  the  benchmark  and  show  what  fraction  of  the  accesses  go  to  what 
fraction  of  the  data,  (ii)  W'e  quantify  the  resulting  buffer  hit  ratios  for  each  relation  as  a  function  of  buffer  size 
(iii)  We  show  that  close  to  linear  scale-up  (about  3%  from  the  ideal)  can  be  achieved  in  a  distributed  system,  assuming 
replication  of  a  read-only  table,  (iv)  VVe  examine  the  effect  of  packing  hot  tuples  into  pages  and  show  that  significant 
price/perform  mce  benefit  can  he  thus  achieved,  (v)  Finally,  by  coupling  the  buffer  simulations  with  the  throughput 
model,  we  ex  imine  typical  disk/memory  configurations  that  maximize  the  overall  pricc/performance. 
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